DATA RECORDING MEDIUM AND DATA REPRODUCTION SYSTEM
Japanese Unexamined Patent No. Sho-63-191372 
Laid-open on: August 8, 1988
Application No. Sho-62-219962 
Filed on: September 2, 1987 
Inventor: Hiroshi SEKIGUCHI 
Applicant: Kanars Data Corporation 
Patent Attorney: Yoshiki HASEGAWA et al. 

SPECIFICATION 

1. TITLE OF THE INVENTION 

Data Recording Medium and Data Reproduction System 

2. WHAT IS CLAIMED IS:

1. A data recording medium for recording speech sound data, 
at least comprising: 

a first region for recording a first speech sound data 
sequence divided into a plurality of sections; 

a second region for recording a second speech sound data 
sequence which has contents corresponding to said first speech 
sound data sequence, which is made up of different speech sound 
data and which is divided into a plurality of sections
corresponding to the sections of said first speech sound data
sequence; and 

a third region for recording recorded position 
identification data which indicates the recorded positions of 
said first and second speech sound data sequences, respectively, 
using each of said sections. 

2. The data recording medium according to Claim 1, wherein 
said third region is a region for collectively recording at 
least a portion of said recorded position identification data 
at a predetermined position as a directory. 

3. The data recording medium according to Claim 1, wherein 
said third region is a region for recording, as identification 
data, at least portions of said recorded position 
identification data at positions adjacent to the respective 
recorded positions of said first and second speech sound data 
sequences. 

4. The data recording medium according to Claim 1, wherein 
each of said sections in said first and second regions is 

made up of one, or more, small sections, and 

said third region includes recorded position identification 
data for each of said small sections. 

5. The data recording medium according to Claim 1, wherein 
said data recording medium is a disk-type recording medium such
as a CD.

6. The data recording medium according to Claim 1, wherein 
said data recording medium is a tape-type recording medium such 
as a DAT.

7. A data recording medium for recording speech sound data, 
at least comprising: 

a first region for recording a first speech sound data 
sequence divided into a plurality of sections; 

a second region for recording a second speech sound data 
sequence which has contents corresponding to said first speech 
sound data sequence, which is made up of different speech sound 
data and which is divided into a plurality of sections 
corresponding to the sections of said first speech sound data 
sequence; 

a third region for recording a third speech sound data 
sequence which has contents corresponding to said first and 
second speech sound data sequences, which is made up of 
different speech sound data and which is divided into a 
plurality of section groups, each group made up of one, or more, 
sections from among said sections; and 

a fourth region for recording recorded position 
identification data which indicates the recorded positions of 
said first and second speech sound data sequences, respectively,
using each of said sections and for recording recorded position
identification data which indicates the recorded positions of
said third speech sound data sequence using each of said section
groups.

8. The data recording medium according to Claim 7, wherein 
said fourth region is a region for collectively recording at 
least a portion of said recorded position identification data 
at a predetermined position as a directory. 

9. The data recording medium according to Claim 7, wherein 
said fourth region is a region for recording, as identification 
data, at least portions of said recorded position 
identification data at positions adjacent to the respective 
recorded positions of said first, second and third speech sound 
data sequences. 

10. The data recording medium according to Claim 7, wherein
each of said sections in said first and second regions is 

made up of one, or more, small sections, and 

said fourth region records recorded position identification 
data for each of said small sections. 

11. The data recording medium according to Claim 7, wherein 
said data recording medium is a disk-type recording medium such 
as a CD. 

12. The data recording medium according to Claim 7, wherein
said data recording medium is a tape-type recording medium such
as a DAT.

13. A data recording medium for recording speech sound data 
and text data, at least comprising: 

a first region for recording a first speech sound data 
sequence divided into a plurality of sections; 

a second region for recording a second speech sound data 
sequence which has contents corresponding to said first speech 
sound data sequence, which is made up of different speech sound 
data and which is divided into a plurality of sections 
corresponding to the sections of said first speech sound data 
sequence; 

a third region for recording a text data sequence which is 
made up of text data having contents corresponding to said first 
and second speech sound data sequences and which is divided 
into a plurality of sections corresponding to the sections of 
said first speech sound data sequence; and 

a fourth region for recording recorded position 
identification data which indicates the respective recorded 
positions of said first and second speech sound data sequences 
and a text data sequence using each of said sections. 

14. The data recording medium according to Claim 13, wherein
said fourth region is a region for collectively recording at
least a portion of said recorded position identification data
at a predetermined position as a directory. 

15. The data recording medium according to Claim 13, wherein
said fourth region is a region for recording, as identification 
data, at least portions of said recorded position 
identification data at positions adjacent to the respective 
recorded positions of said first and second speech sound data 
sequences as well as a text data sequence. 

16. The data recording medium according to Claim 13, wherein 
each of said sections in said first, second and third regions

is made up of one, or more, small sections, and 

said fourth region is a region for recording recorded 
position identification data for each of said small sections. 

17. The data recording medium according to Claim 13, wherein
said data recording medium is a disk-type recording medium such 
as a CD. 

18. The data recording medium according to Claim 13, wherein
said data recording medium is a tape-type recording medium such 
as a DAT. 

19. A data reproduction system for reproducing speech sound 
data that has been recorded on a medium in advance, wherein 

said speech sound data at least includes a first speech sound 
data sequence divided into a plurality of sections; a second
speech sound data sequence which has contents corresponding
to said first speech sound data sequence, which is made up of
different speech sound data and which is divided into a
plurality of sections corresponding to the sections of said
first speech sound data sequence; and recorded position
identification data which indicates recorded positions of said 
first and second speech sound data sequences, respectively, 
in said medium using each of said sections, and 
the data reproduction system at least comprises: 
the first step of reading out speech sound data of a section 
in said second speech sound data sequence corresponding to the 
section in said first speech sound data sequence during 
reproduction from said medium based on said recorded position 
identification data when a reproduction instruction of said 
second speech sound data sequence is inputted during the 
reproduction, or after an interruption, of said first speech 
sound data sequence and of reproducing the read out speech sound 
data; and 

the second step of reading out speech sound data of a section
in said first speech sound data sequence corresponding to the 
section in said second speech sound data sequence during 
reproduction from said medium based on said recorded position 
identification data when a reproduction instruction of said
first speech sound data sequence is inputted during the
reproduction, or after an interruption, of said second speech 
sound data sequence and of reproducing the read out speech sound 
data. 

20. The data reproduction system according to Claim 19, 
wherein said first step is the step of shifting back the 
position of the speech sound data in said second speech sound 
data sequence that has been read out from said medium by an 
instructed amount in units of a notch with respect to said 
sections when a return instruction is inputted at the time of 
input of said reproduction instruction. 

21. The data reproduction system according to Claim 19, 
wherein said second step is the step of shifting back the 
position of the speech sound data in said first speech sound 
data sequence that has been read out from said medium by an 
instructed amount in units of a notch with respect to said 
sections when a return instruction is inputted at the time of 
input of said reproduction instruction. 

22. A data reproduction system for reproducing speech sound 
data that has been recorded on a medium in advance, wherein 

said speech sound data at least includes: a first speech 
sound data sequence divided into a plurality of sections; a 
second speech sound data sequence which has contents
corresponding to said first speech sound data sequence, which
is made up of different speech sound data and which is divided 
into a plurality of sections corresponding to the sections of 
said first speech sound data sequence; a third speech sound 
data sequence which has contents corresponding to said first 
and second speech sound data sequences, which is made up of 
different speech sound data and which is divided into a 
plurality of section groups, each group made up of one, or more, 
sections from among said sections; and recorded position 
identification data which indicates the recorded positions of 
said first and second speech sound data sequences, respectively, 
in said medium using each of said sections and which indicates 
the recorded position of said third speech sound data sequence 
in said medium using each of said section groups, and 
the data reproduction system at least comprises: 
the first step of reading out speech sound data of a section 
in said second speech sound data sequence corresponding to the 
section in said first speech sound data sequence during 
reproduction from said medium based on said recorded position 
identification data when a reproduction instruction of said 
second speech sound data sequence is inputted during the 
reproduction, or after an interruption, of said first speech 
sound data sequence and of reproducing the read out speech sound
data;

the second step of reading out speech sound data of a section 
in said first speech sound data sequence corresponding to the 
section in said second speech sound data sequence during 
reproduction from said medium based on said recorded position 
identification data when a reproduction instruction of said 
first speech sound data sequence is inputted during the 
reproduction, or after an interruption, of said second speech 
sound data sequence and of reproducing the read out speech sound 
data; and 

the third step of reading out speech sound data of a section 
group in said third speech sound data sequence corresponding
to the section in said first or second speech sound data
sequence during reproduction from said medium based on said
recorded position identification data when a reproduction 
instruction of said third speech sound data sequence is 
inputted during the reproduction, or after an interruption, 
of said first or second speech sound data sequence and of 
reproducing the read out speech sound data. 

23. The data reproduction system according to Claim 22, 
wherein said first step is the step of shifting back the
position of the speech sound data in said second speech sound
data sequence that has been read out from said medium by an
instructed amount in units of a notch with respect to said
sections when a return instruction is inputted at the time of 
input of said reproduction instruction. 

24. The data reproduction system according to Claim 22, 
wherein said second step is the step of shifting back the 
position of the speech sound data in said first speech sound 
data sequence that has been read out from said medium by an 
instructed amount in units of a notch with respect to said 
sections when a return instruction is inputted at the time of 
input of said reproduction instruction. 

25. The data reproduction system according to Claim 22, 
wherein said third step is the step of shifting back the 
position of the speech sound data in said third speech sound 
data sequence that has been read out from said medium by an 
instructed amount in units of a notch with respect to said 
section groups when a return instruction is inputted at the 
time of input of said reproduction instruction. 

26. A data reproduction system for reproducing speech sound 
data and text data that have been recorded on a medium in advance, 
wherein 

said speech sound data and text data include at least: a 
first speech sound data sequence divided into a plurality of 
sections; a second speech sound data sequence which has
contents corresponding to said first speech sound data sequence,
which is made up of different speech sound data and which is 
divided into a plurality of sections corresponding to the 
sections of said first speech sound data sequence; a text data 
sequence divided into a plurality of sections corresponding 
to the sections of said first speech sound data sequence which 
is made up of text data having contents corresponding to said 
first or second speech sound data sequence; and recorded 
position identification data showing the recorded positions 
of said first and second speech sound data sequences as well 
as a text data sequence, respectively, in said medium using 
each of said sections, and 

the data reproduction system comprises at least: 
the first step of reading out speech sound data of a section 
in said second speech sound data sequence corresponding to the 
section in said first speech sound data sequence during 
reproduction from said medium based on said recorded position 
identification data when a reproduction instruction of said 
second speech sound data sequence is inputted during the 
reproduction, or after an interruption, of said first speech
sound data sequence and of reproducing the read out speech sound 
data; 

the second step of reading out speech sound data of a section
in said first speech sound data sequence corresponding to the
section in said second speech sound data sequence during 
reproduction from said medium based on said recorded position 
identification data when a reproduction instruction of said 
first speech sound data sequence is inputted during the 
reproduction, or after an interruption, of said second speech 
sound data sequence and of reproducing the read out speech sound 
data; and 

the third step of reading out text data of the corresponding
section in said text data sequence from said medium based on
said recorded position identification data during the 
reproduction of, or after an interruption of, said first or 
second speech sound data sequence and of displaying the read 
out text data. 

27. The data reproduction system according to Claim 26, 
wherein said first step is the step of shifting back the 
position of the speech sound data in said second speech sound 
data sequence that has been read out from said medium by an 
instructed amount in units of a notch with respect to said 
sections when a return instruction is inputted at the time of 
input of said reproduction instruction. 

28. The data reproduction system according to Claim 26, 
wherein said second step is the step of shifting back the
position of the speech sound data in said first speech sound
data sequence that has been read out from said medium by an 
instructed amount in units of a notch with respect to said 
sections when a return instruction is inputted at the time of 
input of said reproduction instruction. 

3. DETAILED DESCRIPTION OF THE INVENTION 

[Field of the Invention] 

The present invention relates to a data recording medium 
such as a CD-ROM, a DAT, or the like, in which speech sound 
data, and the like, is recorded, and relates to a data 
reproduction system for reproducing speech sound data, and the 
like, that has been recorded on such a medium in advance. 

[Prior Art]

A variety of speech sound data recorded on a medium such 
as a cassette tape have been provided for the purpose of 
self-study of a language such as English conversation, the 
practice of reciting Chinese poems, the study of law, etc. Here, 
a cassette tape for the self-study of English conversation will 
be described as an example. A sequence of spoken English
(speech sound data), for example, is recorded on a cassette
tape (or a record); according to the prior art, such a tape
is combined with textbooks as teaching materials and is used
by the learner for self-study. Such materials are available
at a variety of levels, from novice to expert.

With the prior art, when a portion of the tape cannot be made
out, it is possible to listen to that portion again by rewinding
the tape slightly, and to repeat this as many times as needed.
In the study of a foreign language, however, some portions
cannot be made out however many times they are listened to.
In such a case, the prior art offers no method other than
referring to the textbook. This is because, when a language
such as American English is spoken in a manner in which a
plurality of words is pronounced as if it were one word, some
portions often cannot be made out even if the tape is played
slowly.

However, resolving the portions of a tape that cannot be made
out by referring to a textbook, as described above, has two
main problems.

The first problem is that modern Japanese learners, whose
listening ability is extremely low in comparison with their
reading ability, habitually rely on text, so that their
listening ability tends not to improve over time. It is
therefore desirable to avoid reading text as much as possible
during listening practice. With a practice system using a
conventional tape (or other media), however, there is no
appropriate alternative to reading the textbook when such a
situation is encountered. The negative effect is small when
the textbook needs to be consulted only about once every 5 or
10 minutes; on the other hand, such materials may be too easy
for that person to provide good practice. When teaching
materials appropriate for practice are selected, many portions
that cannot be made out are encountered. Reading the textbook
at such times prevents the person from eliminating the bad
habit of relying on text.

The second problem is the trouble of reading a textbook 
whenever it is necessary to do so. In particular, it is 
desirable to study without reading a textbook when practicing 
one's listening skills on a train or the like. 

Therefore, the following two systems can be considered as
attempts, within the prior art, to solve the above described
problems of referring to a textbook.

The first system is to sequentially record English speech
sound data spoken by a native speaker at a natural speed and
English speech sound data of the same content spoken slowly,
with the words separated one by one, so that it is easy for a
Japanese person to follow; and, if necessary, to sequentially
record additional Japanese speech sound data explaining the
English speech sound data in Japanese. Concretely, "It's not
much of a problem. I'd second that." is first recorded at
native speaker speed, with a length of 1.8 seconds; then "It is
not much of a problem. I would second that." is recorded at a
slower speed, with a length of 5.0 seconds; and next the
following is recorded in Japanese, with a length of 37 seconds:
"'It' indicates a preceding 'something' and 'not much of a
problem' is an idiom meaning 'no problem' or 'no worries.'
'I'd' is the abbreviated version of 'I would' while 'I could'
may be abbreviated to 'I'd' in the same manner. The meaning of
the word 'second' is to 'indicate' or to 'agree,' so that 'I'd
second that' means 'I agree with that.'"

The above is recorded sequentially and is therefore reproduced
sequentially unless the operator skips or fast-forwards to a
different part of the medium.

The second system is to record the English speech sound data
described above, spoken by a native speaker at a natural speed,
on, for example, the first track of the tape; then to record
the English speech sound data spoken slowly, with the words
separated one by one, on the second track; and finally to
record an additional Japanese explanation on another track if
necessary. According to this system, speech sound data of a
desired content can be reproduced by designating the track to
be reproduced.

[Problems to be Solved by the Invention]

People who use English conversation tapes include both novices
and experts. Therefore, the following problem arises when using
the first system described above. That is to say, for experts
the primary purpose is to listen to English spoken by a native
speaker, and it is therefore troublesome to have to listen to
the slowly spoken English and the Japanese explanation, because
they interrupt the English spoken by the native speaker. Even
if one tries to select and reproduce only the parts spoken by
a native speaker, they are not easy to find because of the
sequential recording, and complicated switching operations
become necessary.

On the other hand, for novices, English spoken by a native
speaker is too fast to follow, and it becomes necessary to
listen to the slowly spoken English and the Japanese
explanation and to compare them with the English spoken by the
native speaker. However, with the first system, where the
sentences are heard sequentially, the details cannot be fully
understood when the sentences are long or include a plurality
of parts that cannot be made out. Moreover, when one attempts
to compare the parts spoken by a native speaker with the
slowly spoken parts and the parts of the Japanese explanation,
it is difficult to find the respective parts because of the
sequential recording, and very complicated switching
operations become necessary.

In comparison, with the second system described above, the
respective parts of the speech sound data are recorded
separately on different tracks, and it is therefore
comparatively easy to sequentially reproduce, for example,
only the parts spoken by a native speaker. However, when a
part of the English spoken by a native speaker on the first
track cannot be made out, it is difficult to find the
corresponding recorded position on the second track. Of course,
a correspondence table between the tape counter numbers of the
first track and those of the second track can be prepared so
that, when English spoken by a native speaker cannot be made
out at a certain recorded position on the first track, the tape
can be fast-forwarded or rewound to the corresponding recorded
position on the second track according to the correspondence
table, and one can listen to the slowly spoken English. It is
also not impossible to automate this process. However, some
drawbacks remain, such as the length of time needed for
fast-forwarding or rewinding the tape.

Therefore, a conventional data recording medium or data
reproduction system for the self-study of English conversation
cannot satisfy both experts and novices. Similar problems
arise, in addition to the self-study of a language, in
materials for practicing the recitation of Chinese poems, for
the study of law and for other such subjects.

Thus, an object of the present invention is to provide a
data recording medium and a data reproduction system that
provide appropriate speech sound data, etc., for both experts,
that is to say, those who have mastered a field of study, and
novices, that is to say, those who have just begun to study, so
that both can be satisfied.

[Means for Solving Problems]

The first mode of a data recording medium according to the
present invention is a medium (for example, a CD-ROM or a DAT)
in which speech sound data is recorded, and is characterized
by being provided with at least the following three regions:

the first region for recording a first speech sound data 
sequence (for example, English speech sound data sequence 
spoken by a native speaker) divided into a plurality of 
sections; 

the second region for recording a second speech sound data 
sequence (for example, English speech sound data sequence 
spoken by separating the data word by word) which has contents 
corresponding to the first speech sound data sequence, is made 
up of different speech sound data and is divided into a 
plurality of sections corresponding to the plurality of 
sections of the first speech sound data sequence; and 

the third region for recording, as a directory for example, 
recorded position identification data which indicates the 
respective recorded positions of the first and second speech 
sound data sequences in the medium using each of the sections. 
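
As an illustration only, the three regions described above might be modelled as follows. This is a minimal sketch under stated assumptions, not the layout prescribed by the specification: the names `Extent`, `Directory`, `add` and `locate`, the sequence labels "first" and "second", and all address values are hypothetical, and recorded positions are expressed in arbitrary address units.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extent:
    """Recorded position of one section (hypothetical address units)."""
    start: int
    length: int


@dataclass
class Directory:
    """Third region: recorded position identification data per section."""
    # entries[section_number][sequence_name] -> Extent
    entries: dict = field(default_factory=dict)

    def add(self, section: int, sequence: str, extent: Extent) -> None:
        self.entries.setdefault(section, {})[sequence] = extent

    def locate(self, section: int, sequence: str) -> Extent:
        """Return where the given section of the given sequence is recorded."""
        return self.entries[section][sequence]


# Example values are invented for illustration.
directory = Directory()
directory.add(1, "first", Extent(start=1000, length=18))   # native-speed speech
directory.add(1, "second", Extent(start=5000, length=50))  # slow, word-by-word speech
directory.add(2, "first", Extent(start=1018, length=25))
directory.add(2, "second", Extent(start=5050, length=70))
```

Because both sequences are indexed by the same section number, a call such as `directory.locate(2, "second")` finds the slowly spoken counterpart of section 2 without any search along the medium.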

The second mode of the data recording medium according to
the present invention is a medium (for example, a CD-ROM or a DAT)
in which speech sound data is recorded, and is characterized
by being provided with at least the following four regions:

the first region for recording a first speech sound data
sequence (for example, an English speech sound data sequence
spoken by a native speaker) divided into a plurality of sections;

the second region for recording a second speech sound data
sequence (for example, English speech sound data sequence 
spoken by separating the data word by word) which has contents 
corresponding to the first speech sound data sequence, is made 
up of different speech sound data and is divided into a 
plurality of sections corresponding to the sections of the 
first speech sound data sequence; 

the third region for recording a third speech sound data 
sequence (for example, a speech sound data sequence explained 
in Japanese) which has contents corresponding to the first and
second speech sound data sequences, is made up of different 
speech sound data and is divided into a plurality of section 
groups, each group including one or more sections of the first 
and second speech sound data sequences put together 
collectively; and 

the fourth region for recording, as a directory for example, 
recorded position identification data which indicates the 
recorded positions of the first and second speech sound data 
sequences, respectively, using each of the sections and for 
recording, as a directory for example, recorded position 
identification data which indicates the recorded positions of
the third speech sound data sequence using each of the section
groups.
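
The relationship between sections and section groups can be sketched as a lookup from a section number to the group that contains it. The following is only an illustrative sketch: the table names, the function names and every number in it are assumptions introduced here, and the specification does not state how the grouping is encoded.

```python
from bisect import bisect_right

# First section number covered by each section group of the third sequence.
# Invented example: group 0 covers sections 1-2, group 1 covers section 3,
# group 2 covers sections 4 and later.
GROUP_FIRST_SECTION = [1, 3, 4]

# Recorded position (hypothetical address units) of each section group.
GROUP_POSITION = [9000, 9370, 9600]


def group_for_section(section: int) -> int:
    """Return the index of the section group that contains `section`."""
    index = bisect_right(GROUP_FIRST_SECTION, section) - 1
    if index < 0:
        raise ValueError(f"section {section} precedes the first group")
    return index


def explanation_position(section: int) -> int:
    """Recorded position of the third-sequence data for `section`."""
    return GROUP_POSITION[group_for_section(section)]
```

With these example tables, sections 1 and 2 share one explanation, so a request for the third sequence made while either section is being reproduced resolves to the same recorded position.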

The third mode of the data recording medium according to
the present invention is a medium (for example, a CD-ROM or a DAT)
in which speech sound data and text data are recorded, and is
characterized by being provided with at least the following
four regions:

the first region for recording a first speech sound data 
sequence (for example English speech sound data sequence spoken 
by a native speaker) divided into a plurality of sections; 

the second region for recording a second speech sound data 
sequence (for example, English speech sound data sequence 
spoken by separating the data word by word) which has contents 
corresponding to the first speech sound data sequence, is made 
up of different speech sound data and is divided into a 
plurality of sections corresponding to the sections of the 
first speech sound data sequence; 

the third region for recording a text data sequence which
is made up of text data having contents corresponding to the
first or second speech sound data sequence and which is divided
into a plurality of sections corresponding to the sections of
the first speech sound data sequence; and

the fourth region for recording, as a directory for example,
recorded position identification data which indicates the 
recorded positions of the first and second speech sound data 
sequences as well as the text data sequence, respectively, 
using each of the sections. 
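
A text data sequence divided into the same sections can be addressed in the same way as the speech sound data. The sketch below is an assumption-laden illustration: `build_text_region` and `read_text` are hypothetical helper names, the text region is simulated as an in-memory byte string, and the UTF-8 encoding is chosen only for convenience.

```python
def build_text_region(section_texts):
    """Pack per-section text into one region and build its directory.

    Returns (region_bytes, directory), where directory maps a section
    number (starting at 1) to an (offset, length) pair in the region.
    """
    region = bytearray()
    directory = {}
    for number, text in enumerate(section_texts, start=1):
        data = text.encode("utf-8")
        directory[number] = (len(region), len(data))
        region += data
    return bytes(region), directory


def read_text(region, directory, section):
    """Read out the text of one section using its directory entry."""
    offset, length = directory[section]
    return region[offset:offset + length].decode("utf-8")


# Example content taken from the sentences quoted in this specification.
REGION, TEXT_DIRECTORY = build_text_region(
    ["It is not much of a problem.", "I would second that."]
)
```

Calling `read_text(REGION, TEXT_DIRECTORY, 2)` returns the text for section 2, which a reproduction system could display while the corresponding speech section is reproduced or after it is interrupted.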

In addition, the first mode of the data reproduction system 
according to the present invention is a data reproduction 
system for reproducing speech sound data that has been recorded 
on a medium (for example, a CD-ROM, or a DAT) in advance, and 
is characterized in that the speech sound data includes at least 
the following three types of data, and in that the data 
reproduction system is provided with the two steps described
below. That is to say,

the speech sound data at least includes: a first speech sound 
data sequence (for example, English speech sound data sequence 
spoken by a native speaker) divided into a plurality of 
sections; a second speech sound data sequence (for example, 
English speech sound data sequence spoken by separating the 
data word by word) which has contents corresponding to the first 
speech sound data sequence, is made up of different speech sound 
data and is divided into a plurality of sections corresponding 
to the sections of the first speech sound data sequence; and
recorded position identification data (for example, data
recorded as a directory) which indicates recorded positions
of the first and second speech sound data sequences, 
respectively, in the medium using each of said sections, and 

the data reproduction system is provided with at least the 
two steps as follows: 

the first step of reading out speech sound data of a section 
in the second speech sound data sequence corresponding to the 
section in the first speech sound data sequence during 
reproduction or after an interruption from the medium based 
on the recorded position identification data when a 
reproduction instruction of the second speech sound data 
sequence is inputted during the reproduction, or after an 
interruption, of the first speech sound data sequence, and of 
reproducing the read out speech sound data; and 

the second step of reading out speech sound data of a section 
in the first speech sound data sequence corresponding to the 
section in the second speech sound data sequence during 
reproduction from the medium based on the recorded position 
identification data when a reproduction instruction of the 
first speech sound data sequence is inputted during the 
reproduction, or after an interruption, of the second speech 
sound data sequence, and of reproducing the read out speech 
sound data. 

The second mode of the data reproduction system according 
to the present invention is a data reproduction system for 
reproducing speech sound data that has been recorded on a medium 
(for example, a CD-ROM, or a DAT) in advance, and is 
characterized in that the speech sound data includes at least 
the following four types of data, and in that the data 
reproduction system is provided with the below described three 
steps. That is to say, 

the speech sound data at least includes: a first speech sound 
data sequence (for example, English speech sound data sequence 
spoken by a native speaker) divided into a plurality of 
sections; a second speech sound data sequence (for example, 
English speech sound data sequence spoken by separating the 
data word by word) which has contents corresponding to the first 
speech sound data sequence, is made up of different speech sound 
data and is divided into a plurality of sections corresponding 
to the sections of the first speech sound data sequence; a third 
speech sound data sequence (for example, a speech sound data 
sequence explained in Japanese) which has contents 
corresponding to the first and second speech sound data 
sequences, is made up of different speech sound data and is 
divided into a plurality of section groups, each group made 
up of one, or more, sections from among the above described 
sections; and recorded position identification data (for 
example, data recorded as a directory) which indicates the 
recorded positions of the first and second speech sound data 
sequences, respectively, in the medium using each of the 
sections and which indicates the recorded position of the third 
speech sound data sequence in the medium using each of the 
section groups, and 

the data reproduction system is provided with at least the 
three steps as follows: 

the first step of reading out speech sound data of a section 
in the second speech sound data sequence corresponding to the 
section in the first speech sound data sequence during 
reproduction from the medium based on the recorded position 
identification data when a reproduction instruction of the 
second speech sound data sequence is inputted during the 
reproduction, or after an interruption, of the first speech 
sound data sequence, and of reproducing the read out speech 
sound data; 

the second step of reading out speech sound data of a section 
in the first speech sound data sequence corresponding to the 
section in the second speech sound data sequence during 
reproduction from the medium based on the recorded position 
identification data when a reproduction instruction of the 
first speech sound data sequence is inputted during the 
reproduction, or after an interruption, of the second speech 
sound data sequence, and of reproducing the read out speech 
sound data; and 

the third step of reading out speech sound data of a section 
group in the third speech sound data sequence corresponding 
to the section in the first or second speech sound data sequence 
during reproduction from the medium based on the recorded 
position identification data when a reproduction instruction 
of the third speech sound data sequence is inputted during the 
reproduction, or after an interruption, of the first or second 
speech sound data sequence and of reproducing the read out 
speech sound data. 

The third mode of the data reproduction system according 
to the present invention is a data reproduction system for 
reproducing speech sound data and text data that have been 
recorded on a medium (for example, a CD-ROM, or a DAT) in advance, 
and is characterized in that the speech sound data and text 
data include at least the following four types of data, and 
in that the data reproduction system is provided with the below 
described three steps. That is to say, 

the speech sound data and text data include at least: a first 
speech sound data sequence (for example, English speech sound 
data sequence spoken by a native speaker) divided into a 
plurality of sections; a second speech sound data sequence (for 
example, English speech sound data sequence spoken by 
separating the data word by word) which has contents 
corresponding to the first speech sound data sequence, is made 
up of different speech sound data and is divided into a 
plurality of sections corresponding to the sections of the 
first speech sound data sequence; a text data sequence divided 
into a plurality of sections corresponding to the sections of 
the first speech sound data sequence which is made up of text 
data having contents corresponding to the first or second 
speech sound data sequence; and recorded position 
identification data (for example, data recorded as a directory) 
indicating the recorded positions of the first and second 
speech sound data sequences as well as the text data sequence, 
respectively, in the medium using each of the sections, and 

the data reproduction system is provided with at least three 
steps as follows: 

the first step of reading out speech sound data of a section 
in the second speech sound data sequence corresponding to the 
section in the first speech sound data sequence during 
reproduction from the medium based on the recorded position 
identification data when a reproduction instruction of the 
second speech sound data sequence is inputted during the 
reproduction, or after an interruption, of the first speech 
sound data sequence, and of reproducing the read out speech 
sound data; 

the second step of reading out speech sound data of a section 
in the first speech sound data sequence corresponding to the 
section in the second speech sound data sequence during 
reproduction from the medium based on the recorded position 
identification data when a reproduction instruction of the 
first speech sound data sequence is inputted during the 
reproduction, or after an interruption, of the second speech 
sound data sequence, and of reproducing the read out speech 
sound data; and 

the third step of reading out text data of the corresponding 
section in the text data sequence from the medium based on the 
recorded position identification data during the reproduction 
of, or after an interruption of, the first or second speech 
sound data sequence, and of displaying the read out text data 
on an LCD, or the like. 

[Action] 

The first mode of the data recording medium according to 
the present invention is configured as described above and 
therefore the first region for recording the first speech sound 
data sequence serves to record the basic speech sound data by 
dividing the first speech sound data into a plurality of 
sections based on, for example, breaks concerning 
pronunciation, breaks concerning some linguistic elements or 
without having particular relationships with these breaks; the 
second region for recording the second speech sound data 
sequence serves to record speech sound data that has been, for 
example, paraphrased from the speech sound data of the first 
speech sound data sequence by dividing the second speech sound 
data into a plurality of sections; and the third region for 
recording the recorded position identification data serves to 
identify the recorded positions of the above described two data 
sequences in the medium using each of the sections. 

The second mode of the data recording medium according to 
the present invention is configured as described above and 
therefore the first region for recording the first speech sound 
data sequence serves to record the basic speech sound data by 
dividing the first speech sound data into a plurality of 
sections based on, for example, breaks concerning 
pronunciation, breaks concerning some linguistic elements or 
without having particular relationships with these breaks; the 
second region for recording the second speech sound data 
sequence serves to record speech sound data that has been, for 
example, paraphrased from the speech sound data of the first 
speech sound data sequence by dividing the second speech sound 
data into a plurality of sections; the third region for 
recording the third speech sound data sequence serves to record 
speech sound data for individual explanations or a unified 
explanation, or the like, of a plurality of sections of the 
first and second speech sound data sequences; and the fourth 
region for recording the recorded position identification 
data serves to identify recorded positions of the above 
described three data sequences in the medium using each of the 
sections or each of the section groups. 

The third mode of the data recording medium according to 
the present invention is configured as described above and 
therefore the first region for recording the first speech sound 
data sequence serves to record the basic speech sound data by 
dividing the first speech sound data into a plurality of 
sections based on, for example, breaks concerning 
pronunciation, breaks concerning some linguistic elements or 
without having particular relationships with these breaks; the 
second region for recording the second speech sound data 
sequence serves to record speech sound data that has been, for 
example, paraphrased from the speech sound data of the first 
speech sound data sequence by dividing the second speech sound 
data into a plurality of sections; the third region for 
recording a text data sequence serves to record text data 
corresponding to the first or second speech sound data sequence 
by dividing the text data into a plurality of sections; and 
the fourth region for recording the recorded position 
identification data serves to identify the recorded positions 
of the above described three data sequences in the medium using 
each of the sections. 

In addition, the first mode of the data reproduction system 
according to the present invention is configured as described 
above and, therefore, the first and second steps serve to switch 
the reproduction of a speech sound data sequence into that of 
the corresponding section of another speech sound data sequence 
that has been instructed, wherein the recorded position 
identification data serves to indicate the recorded position 
of the switched data sequence using each of the sections at 
the time of the above described switch. 

The second mode of the data reproduction system according 
to the present invention is configured as described above and, 
therefore, the first, second, and third steps serve to switch 
the reproduction of a speech sound data sequence into that of 
the corresponding section or section group of another speech 
sound data sequence that has been instructed, wherein the 
recorded position identification data serves to indicate the 
recorded position of the switched data sequence using each of 
the sections, or each of the section groups, at the time of 
the above described switch. 

The third mode of the data reproduction system according 
to the present invention is configured as described above and, 
therefore, the first and second steps serve to switch the 
reproduction of a speech sound data sequence into that of the 
corresponding section of another speech sound data sequence 
that has been instructed and the third step serves to display 
the text data in the section corresponding to the speech sound 
data during reproduction, wherein the recorded position 
identification data serves to indicate the recorded position 
of the data sequence that is displayed or to which the 
reproduction is switched at the time of this displaying and 
of this switching of the reproduction. 

[Preferred Embodiment] 

In the following, the basic contents of the above described 
six modes of the present invention are respectively described 
prior to concrete descriptions of the embodiments of the 
present invention. Here, the respective constituent features 
described in the claims are, of course, not limited by the basic 
contents. 

The constituent features of the first mode of the data 
recording medium according to the present invention are 
clarified in Claim 1. A recording medium of this mode is 
characterized by recording at least two speech sound data 
sequences. That is to say, the first speech sound data sequence 
is, for example, made up of English speech sound data spoken 
by a native speaker at a natural speed and this speech sound 
data sequence is divided into a plurality of sections (so-called 
segments). Though the second speech sound data sequence 
corresponds to the contents of the above described first speech 
sound data sequence, it is made up of different speech sound 
data and is, for example, English speech sound data spoken at 
a slower speed by separating the data word by word. 

The important thing here is that the above described first 
and second speech sound data sequences are divided respectively 
into pluralities of sections so that the respective sections 
correspond to each other. In the case wherein the t'th section 
of the first speech sound data sequence is "It's" spoken by 
a native speaker, for example, the t'th section of the second 
speech sound data sequence becomes "It is" spoken by separating 
the data word by word. Accordingly, the sections in the first 
and second speech sound data sequences mean segments divided 
based on breaks concerning pronunciation or concerning some 
linguistic elements. Here, the formation of sections based on 
such breaks is not essential to the present invention, but 
rather, the sections may be based on time intervals unrelated 
to the above described breaks as shown in the below example 
using a DAT. In addition, the expression "having contents 
corresponding to the first speech sound data sequence and 
being made up of different speech sound data" can be restated 
as follows: the second speech sound data sequence has the same 
meaning as the first speech sound data sequence, as far as the 
language is concerned, but has a different pronunciation from 
the first speech sound data sequence. 

Furthermore, the recording medium of this mode is 
characterized by recording recorded position identification 
data. Thus, this recorded position identification data 
functions so as to indicate at which position in the medium 
the speech sound data of the first and second speech sound data 
sequences is recorded. Accordingly, the position in the medium 
in which "It is" in the second speech sound data sequence 
corresponding to "It's" of the t'th section in the first speech 
sound data sequence, for example, is recorded can be recognized 
according to the above described recorded position 
identification data. 

As a result, it can be understood that the first and second 
speech sound data sequences and the recorded position 
identification data are not recorded independently of each other 
but rather are recorded to maintain a certain relationship so 
that the respective pieces of data are combined in an organic 
manner using the sections as units. That is to say, the first 
and second speech sound data sequences become a pair with each 
other wherein the recorded position identification data makes 
these sequences correspond to each other by the respective 
sections. Here, the recording medium of this mode is the most 
basic mode of the present invention. 

The constituent features of the second mode of the data 
recording medium according to the present invention are 
clarified in Claim 7. Thus, this mode differs from the above 
described first mode in the point wherein the third speech sound 
data sequence is recorded on the medium in addition to the first 
and second speech sound data sequences according to this mode. 

The important thing here is that the third speech sound data 
sequence is divided into section groups wherein each section 
group is made up of one, or more, of the sections of the first 
and second speech sound data sequences. In other words, one 
section group of the third speech sound data sequence includes 
one, or more, of the sections of the first and second speech 
sound data sequences and accordingly one section group pairs 
up with one or more sections. 

Furthermore, a recording medium of this mode has an effect 
wherein the recorded position identification data indicates 
the recorded positions of the contents of the third speech sound 
data sequence by using each of the section groups. Accordingly, 
the first, second and third speech sound data sequences as well 
as the recorded position identification data are recorded on 
the medium so as to maintain a certain relationship with each 
other wherein the respective pieces of the data are combined 
in an organic manner using the sections and the section groups 
as units. 

The constituent features of the third mode of the data 
recording medium according to the present invention are 
clarified in Claim 13. Thus, this mode differs from the above 
described second mode in the point wherein a text data sequence 
is recorded on the medium in addition to the first and second 
speech sound data sequences according to this mode. This text 
data sequence is made up of text data having contents 
corresponding to the first and second speech sound data 
sequences and represents English spoken by a native speaker 
with text, for example. 

This text data sequence is also divided into sections 
corresponding to the sections of the first and second speech 
sound data sequences. In addition, the recorded position 
identification data functions such that the recorded position 
of this text data sequence is identified using each of the 
sections. Accordingly, the first and second speech sound data 
sequences and the text data sequence correspond to each other 
using the sections as units. Here, in the case wherein the third 
speech sound data sequence according to the second mode is added 
as recorded data to the recording system of this mode, one or 
more sections of the first and second speech sound data 
sequences as well as the text data sequence correspond to one 
section group of the third speech sound data sequence. 

The modes of the data recording medium according to the 
present invention are described above. In comparison to this, 
the data reproduction system according to the present invention 
has the following three modes and these modes are compared to 
the three modes of the data recording medium respectively shown 
above. 

The constituent features of the first mode of the data 
reproduction system according to the present invention are 
shown in Claim 19. The reproduction system of this mode 
presupposes that speech sound data according to the first mode 
of the recording medium has been recorded on the medium in 
advance. Furthermore, it is characterized by being provided 
with at least the following two steps. 

The first step is switching from the reproduction of the 
first speech sound data sequence to the second speech sound 
data sequence. Thus, this switching is carried out using the 
sections as units. In the case wherein a reproduction 
instruction of the second speech sound data sequence is 
inputted when the t'th section of the first speech sound data 
sequence is being reproduced, for example, the t'th section 
of the second speech sound data sequence is read out based on 
the recorded position identification data so as to reproduce 
the speech sound data thereof. The second step is switching 
from the reproduction of the second speech sound data sequence 
to the first speech sound data sequence and this switching is 
also carried out according to the sections in the same manner 
as the first step. 
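
The section-by-section switching of the first and second steps can be sketched as follows. This is only an illustrative sketch in Python and not the implementation of the embodiment: the names Entry and switch_sequence, the dictionary layout of the directory and the sequence labels "A" and "B" are assumptions introduced here, and the offsets used are placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Recorded position of one section of one sequence (hypothetical layout)."""
    offset: int  # byte offset of the section on the medium (placeholder unit)
    length: int  # length of the section in bytes


def switch_sequence(directory, current_section, target_sequence):
    """Look up where the section of target_sequence that corresponds to the
    section currently being reproduced is recorded.

    directory maps a section number to a dict {sequence name: Entry}; this
    stands in for the recorded position identification data.
    """
    return directory[current_section][target_sequence]


# Placeholder directory with one section recorded in two sequences.
directory = {
    621: {"A": Entry(offset=0, length=1200),
          "B": Entry(offset=1200, length=5400)},
}

# First step: sequence A is playing section 621, sequence B is requested.
to_b = switch_sequence(directory, 621, "B")
# Second step: sequence B is playing section 621, sequence A is requested.
to_a = switch_sequence(directory, 621, "A")
```

In both directions the section number is kept and only the sequence changes, which is why the same lookup serves both steps.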

Here, the first and second steps are not limited to the above 
description, but rather, a variety of modifications thereof 
are possible. The so-called return instruction is a 
representative example. That is to say, in the case wherein 
a return instruction is inputted after the reproduction is 
temporarily interrupted due to a stop instruction during the 
reproduction, the readout position of the speech sound data 
is shifted back by the instructed amount and thereby 
reproduction of the speech sound data is carried out in the 
desired manner. 

The constituent features of the second mode of the data 
reproduction system according to the present invention are 
shown in Claim 22. Thus, this mode differs from the above 
described first mode in the point that this mode presupposes 
that the speech sound data according to the second mode of the 
recording medium has been recorded on the medium in advance 
and that this mode is provided with the third step of switching 
from the reproduction of the first or second speech sound data 
sequence to the third speech sound data sequence. 

In the case when "It's" by a native speaker cannot be heard 
during the reproduction of the first speech sound data sequence, 
for example, the reproduction is switched to the second speech 
sound data sequence according to the first step so that "It 
is" spoken slowly by separating the data word by word can be 
heard. Thus, in the case where the meaning and the grammar of 
this are desired to be known in Japanese, the reproduction may 
be switched to the third speech sound data sequence according 
to the third step. 

The system of this mode can, of course, be modified so that 
a return instruction and a stop instruction as described in 
the above first mode can be combined and used. 

The constituent features of the third mode of the data 
reproduction system according to the present invention are 
shown in Claim 26. Thus, this mode differs from the above 
described second mode in the point that this mode presupposes 
that the speech sound data and the text data according to the 
third mode of the recording medium have been recorded on the 
medium in advance and the text data sequence is displayed during 
the reproduction of the first or second speech sound data 
sequence according to this mode. 

In the case where "It's" of the first speech sound data 
sequence is reproduced, for example, "It's" or "It is" is 
displayed in a predetermined display part. Here, it is not 
necessary for this display to completely synchronize with the 
speech sound data during reproduction chronologically and text 
may be displayed with a slight delay or may be displayed 
slightly ahead of time. 

Next, several embodiments of the present invention are 
concretely described in reference to the attached drawings. 

First, the first example of a recording medium according 
to the present invention is described in reference to Fig. 1 
through Fig. 3. Fig. 1 is a view showing data sequences A, B 
and C as well as the recorded contents thereof where a recording 
medium of the present invention is applied for self-study of 
English conversation. In this figure, data sequence A indicates 
an English speech sound data sequence (first speech sound data 
sequence) spoken by a native speaker and is divided into a 
plurality of segments 621 to 627 indicating the sections 
thereof. Data sequence B indicates an English speech sound data 
sequence (second speech sound data sequence) spoken slowly by 
separating the data word by word and this is speech sound data 
of sections made up of English words and phrases corresponding 
to segments 621 to 627. Data sequence C indicates a speech sound 
data sequence (third speech sound data sequence) for Japanese 
explanation and corresponds to segments 621 to 624 and 625 to 
627 of data sequences A and B, respectively. 
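
The correspondence between segments and section groups in Fig. 1 can be sketched as a simple range lookup. The following Python sketch assumes only the grouping stated above (segments 621 to 624 and 625 to 627); the names GROUP_STARTS, GROUP_ENDS and group_for_segment are hypothetical and are not taken from the embodiment.

```python
import bisect

# Section groups of data sequence C as given for Fig. 1:
# one group covers segments 621-624, the other covers segments 625-627.
GROUP_STARTS = [621, 625]
GROUP_ENDS = [624, 627]


def group_for_segment(segment):
    """Return (first_segment, last_segment) of the section group of data
    sequence C that contains the given segment of data sequence A or B."""
    i = bisect.bisect_right(GROUP_STARTS, segment) - 1
    if i < 0 or segment > GROUP_ENDS[i]:
        raise KeyError(segment)
    return GROUP_STARTS[i], GROUP_ENDS[i]


# While segment 623 is reproduced, an instruction for the explanation
# selects the group covering segments 621 to 624.
group = group_for_segment(623)
```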

Fig. 2 is a view showing the relationship between time and 
capacity of each segment in the example shown in Fig. 1. In 
this figure, one second corresponds to a capacity of 6 kilobytes. 
It takes 0.4 seconds for "much of a" of segment 623 to be spoken 
at a native speaking speed, for example, and therefore, the 
recording capacity used in the medium is 2.4 kilobytes. 
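
The relationship between speaking time and recorded capacity can be checked with a short calculation. The Python sketch below assumes, as the figures in this example imply, that one kilobyte is counted as 1000 bytes; the function names are hypothetical.

```python
BYTES_PER_SECOND = 6000  # one second corresponds to 6 kilobytes


def segment_bytes(seconds):
    """Recording capacity, in bytes, used by a segment of the given duration."""
    return round(seconds * BYTES_PER_SECOND)


def segment_seconds(length_bytes):
    """Reproduction time, in seconds, of a segment of the given byte length."""
    return length_bytes / BYTES_PER_SECOND


# "much of a" of segment 623 spoken in 0.4 seconds at native speed.
native_bytes = segment_bytes(0.4)
```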

Fig. 3 is a view showing one example of a directory in the 
example shown in Fig. 1 and Fig. 2. In this figure, one segment 
of the directory is formed of 9 * 3 = 27 bytes. Data sequences 
A, B and C correspond to data sequences A, B and C of Fig. 1, 
respectively. In addition, C of one byte indicates an attribute 
and when this is 129, that is to say "10000001" according to 
the bit expression (8 bits), this indicates that it has the 
same contents as the contents of the previous data sequence. 

Position data M, S and B (one byte each) are parameters 
showing the positions in a CD-ROM that has been standardized 
in the industry. That is to say M indicates minutes, S indicates 
seconds, and B indicates blocks, respectively, wherein one 
block is made up of 2048 bytes. Thus, 75 blocks form an amount 
for one second. Accordingly, the maximum numbers are M = 59, 
S = 59, and B = 74. SB of the next two bytes indicates the start 
byte and LLL of the following three bytes indicates the byte 
length. Here, minutes and seconds are used as parameters 
indicating position, because the CD-ROM was originally 
developed for music recording, and therefore, the recorded 
position is represented as a length of a time from the start. 
Thus, in the case of a CD-ROM, minutes and seconds are not at 
all related to the period of time of reproduction, but rather 
this data simply represents the recorded position. 
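
One 9-byte directory entry (attribute C, position data M, S and B, start byte SB and byte length LLL) could be decoded as in the following Python sketch. The field order follows the description above, but the big-endian byte order of SB and LLL is an assumption that the description does not state, and the names DirEntry, parse_entry and parse_segment are hypothetical.

```python
from typing import NamedTuple

ENTRY_SIZE = 9                   # C(1) + M(1) + S(1) + B(1) + SB(2) + LLL(3)
SEGMENT_SIZE = 3 * ENTRY_SIZE    # data sequences A, B and C: 9 * 3 = 27 bytes


class DirEntry(NamedTuple):
    attribute: int   # C
    minute: int      # M
    second: int      # S
    block: int       # B
    start_byte: int  # SB
    length: int      # LLL


def parse_entry(raw):
    """Decode one 9-byte entry (byte order of SB and LLL is assumed big-endian)."""
    if len(raw) != ENTRY_SIZE:
        raise ValueError("a directory entry must be 9 bytes long")
    return DirEntry(
        attribute=raw[0],
        minute=raw[1],
        second=raw[2],
        block=raw[3],
        start_byte=int.from_bytes(raw[4:6], "big"),
        length=int.from_bytes(raw[6:9], "big"),
    )


def parse_segment(raw):
    """Decode the three entries (A, B, C) of one 27-byte directory segment."""
    if len(raw) != SEGMENT_SIZE:
        raise ValueError("a directory segment must be 27 bytes long")
    return tuple(parse_entry(raw[i:i + ENTRY_SIZE])
                 for i in range(0, SEGMENT_SIZE, ENTRY_SIZE))


# Entry with attribute 0 at 0 minutes 11 seconds 3 blocks,
# start byte 826, length 1200 bytes.
sample = bytes([0, 0, 11, 3]) + (826).to_bytes(2, "big") + (1200).to_bytes(3, "big")
entry = parse_entry(sample)
```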

As a result, as for "much of a" of segment 623, for example, 
English spoken by a native speaker is recorded starting from 
the 1354th byte of the 0 minutes 11 seconds 42 blocks with a 
length of 2400 bytes, English spoken slowly by separating the 
data word by word is recorded starting from the 1706th byte of 
the 0 minutes 11 seconds 43 blocks with a length of 7200 bytes 
and Japanese explanation is recorded starting from the 1282nd 
byte of 0 minutes 11 seconds 6 blocks with a length of 72000 
bytes. Here, segment numbers 621 to 627 are not in the memory 
but correspond to the addresses thereof. 

Next, effects of the recording medium shown in the above 
described Fig. 1 through Fig. 3 are described. 

As shown in Fig. 3, the region from the 826th byte to the 
(826 + 1200 - 1 =) 2025th byte at 0 minutes 11 seconds 3 blocks 
in the medium serves to record the data sequence wherein the 
segment is 621 and attribute C is 0, that is to say, "It's" 
spoken by a native speaker. In addition, the region starting 
from the 2026th byte to the (2026 + 5400 - 1 =) 7425th byte at 
0 minutes 11 seconds 3 blocks in the medium, that is to say, 
to the 1282nd byte at 0 minutes 11 seconds 6 blocks serves to 
record the data sequence wherein the segment is 621 and 
attribute C is 64, that is to say, "It is" spoken slowly by 
separating the data word by word. In addition, the region from 
the 1282nd byte at 0 minutes 11 seconds 6 blocks to 1601st byte 
at 0 minutes 11 seconds 41 blocks in the medium serves to record 
the data sequence wherein the segment is 621 and attribute C 
is 128, that is to say, the Japanese explanation. 
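
The end positions quoted above follow from the block size of 2048 bytes and 75 blocks per second. The Python sketch below reproduces the arithmetic; the function name end_position is hypothetical, and the sketch assumes that byte numbers are counted from 0 within a block, which the description does not state explicitly.

```python
BLOCK_SIZE = 2048          # bytes per block
BLOCKS_PER_SECOND = 75


def end_position(minute, second, block, start_byte, length):
    """Return (minute, second, block, byte) of the last byte of a region that
    starts at start_byte of the given block and is length bytes long."""
    first_block = (minute * 60 + second) * BLOCKS_PER_SECOND + block
    last = first_block * BLOCK_SIZE + start_byte + length - 1
    abs_block, byte = divmod(last, BLOCK_SIZE)
    total_seconds, blk = divmod(abs_block, BLOCKS_PER_SECOND)
    m, s = divmod(total_seconds, 60)
    return m, s, blk, byte


# "It's" by a native speaker: 1200 bytes from byte 826 of 0 min 11 s block 3.
native_end = end_position(0, 11, 3, 826, 1200)
# Japanese explanation: 72000 bytes from byte 1282 of 0 min 11 s block 6.
explanation_end = end_position(0, 11, 6, 1282, 72000)
# "It is" spoken slowly: 5400 bytes from byte 2026 of 0 min 11 s block 3.
slow_end = end_position(0, 11, 3, 2026, 5400)
```

Under this assumption the slowly spoken data ends at byte 1281 of block 6, that is, immediately before the 1282nd byte at which the Japanese explanation starts.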

As described above, in the case where a directory as shown 
in Fig. 3 is formed, the data sequence shown in Fig. 1 can be 
recorded to have the capacity and reproduction period of time 
as shown in Fig. 2. 

Next, a reproduction system for the recording medium 
described in reference to Fig. 1 through Fig. 3 is described 
in reference to Fig. 4 through Fig. 9. 

Fig. 4 is a block diagram showing the configuration of one 
example of a reproduction unit. In this figure, a medium 1 is, 
for example, a CD-ROM and is set in a reproduction mechanism 
2. Reproduction mechanism 2 is connected to a CPU 5 via a disk 
interface (I/F) 3 and a bus 4. In addition, a ROM 6 of 32 
kilobytes, for example, for storing a program and a RAM 7 of 
256 kilobytes, for example, for temporarily storing a directory 
or a speech sound data sequence, are connected to bus 4. 
Furthermore, bus 4 is connected to a handset interface (I/F) 
9 for sending and receiving data to and from a handset 8 for 
manual operation, and is connected to a D/A converter 12 that 
is connected to an external terminal 11 and handset 8 via an 
AMP 10. Here, an earphone 13 is connected to handset 8. 

Fig. 5(a) and Fig. 5(b) are diagrams showing memory 
allocations of ROM 6 and RAM 7. As shown in Fig. 5(a), a program 
of 32 kilobytes is stored in ROM 6. As shown in Fig. 5(b), RAM 
7 is allocated for a buffer of 50 blocks for (50 + 50 =) 100 
kilobytes, a directory for (75 + 75 =) 150 kilobytes and a system 
area for 6 kilobytes. Accordingly, a speech sound data sequence 
of 50 blocks is always stored in RAM 7 and a directory (only 
the portion of data sequence A takes up approximately 30 
minutes) for 150 kilobytes ÷ 27 bytes = 5555 segments is also 
stored in RAM 7. 

Here, a CD-ROM is used as the medium in the above described 
example and the capacity of a representative CD-ROM is 552 
megabytes. Units of minutes, seconds, and blocks are used to 
represent the address in the CD-ROM. As described above, one 
block is formed of 2048 bytes, 75 blocks form one second and 
60 seconds form one minute while the CD-ROM holds the maximum 
of 59 minutes 59 seconds 74 blocks. Accordingly, the maximum 
storage amount becomes (2048 * 75 * 60 * 60 =) 552.96 megabytes. 
The first two seconds out of the above maximum storage amount 
are used for formatting the CD-ROM and cannot be used by the 
user and thereby, the maximum capacity becomes 552.6528 MB to 
be precise. In the case wherein a directory is allocated to 
a portion corresponding to the first 20 seconds out of the 
maximum capacity, a directory capacity of 3 megabytes can be 
secured in the CD-ROM. 
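
The address arithmetic above can be checked with a short sketch. 
The constants are taken from the text; the function names 
msb_to_block and block_to_msb are hypothetical and are used here 
only for illustration. 

```python
BLOCK_BYTES = 2048        # bytes per block (from the text)
BLOCKS_PER_SECOND = 75    # blocks per second (from the text)
SECONDS_PER_MINUTE = 60


def msb_to_block(minute, second, block):
    """Convert a minute/second/block address to an absolute block index."""
    return (minute * SECONDS_PER_MINUTE + second) * BLOCKS_PER_SECOND + block


def block_to_msb(index):
    """Convert an absolute block index back to (minute, second, block)."""
    seconds, block = divmod(index, BLOCKS_PER_SECOND)
    minute, second = divmod(seconds, SECONDS_PER_MINUTE)
    return minute, second, block


# The last address is 59 min 59 s 74 blocks, so the block count is one more.
total_blocks = msb_to_block(59, 59, 74) + 1
total_bytes = total_blocks * BLOCK_BYTES

# The first two seconds are reserved for formatting.
reserved_bytes = 2 * BLOCKS_PER_SECOND * BLOCK_BYTES
user_bytes = total_bytes - reserved_bytes

# A directory occupying the first 20 seconds of the medium.
directory_bytes = 20 * BLOCKS_PER_SECOND * BLOCK_BYTES
```

Here a megabyte is taken as 1,000,000 bytes, which is the 
convention that reproduces the 552.96 and 552.6528 figures in 
the text. 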

Next, a calculation example concerning the capacity is 
shown. 

A sound sample is produced at 16 kilo-samples/second 
according to an ADPCM system and 3 bits are used for one 
sample. This leads to 48 kilobits/second and accordingly, 
6 kilobytes/second, and therefore, it is necessary for the 
transfer rate to be adjusted to this rate. Here, in the case 
of 16 kilo-samples/second, frequency characteristics up to 
8 kHz are exhibited. Accordingly, consonants can be recorded 
sufficiently. 

According to the above described presumption, a capacity 
of (6 kilobytes * 3600 seconds =) 21.6 megabytes is required 
for recording sound for one hour. In general, 552 megabytes 
are recorded on one CD-ROM, including error corrections. The 
CD-ROM, excluding the directory portion, can store 549 
megabytes of speech sound data. Accordingly, it becomes 
possible to record speech sound data for (549 / 21.6 =) 
approximately 25 hours 24 minutes. Thus, in the case where the 
CD-ROM is used for the study of English conversation, supposing 
that a one hour story spoken by a native speaker at a natural 
speed is provided, the portion of the story that is spoken 
slowly by separating the data word by word would take four 
hours, four times longer than the natural speed. Thus, even in 
the case where 15 hours are required for the explanation 
portion, the total is only 20 hours (1 + 4 + 15), which is 
within the above capacity. 

Next, the method of dividing the one hour conversation 
portion into several sections (segments) is considered. 
Presuming that each second is made up of four segments on 
average, the one hour would be divided into 14400 segments. 

27 bytes are required for a directory for one segment, and 
therefore, approximately 389 kilobytes are required in total 
which can be sufficiently stored in the above described storage 
place for directories of 3 megabytes, and therefore, all of 
the directories for the one hour story can be stored. 
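
The capacity figures above follow from a few multiplications, 
sketched below. All constants come from the text; the variable 
names are chosen here for illustration only, and a kilobyte is 
assumed to be 1,000 bytes as the text's own figures imply. 

```python
SAMPLES_PER_SECOND = 16_000     # 16 kilo-samples/second (from the text)
BITS_PER_SAMPLE = 3             # ADPCM, 3 bits per sample (from the text)
DIRECTORY_ENTRY_BYTES = 27      # bytes per directory entry (from the text)
SEGMENTS_PER_SECOND = 4         # average segments per second (from the text)
SPEECH_CAPACITY_BYTES = 549_000_000   # CD-ROM minus directory (from the text)

# 16,000 samples/s * 3 bits = 48,000 bits/s = 6,000 bytes/s.
bytes_per_second = SAMPLES_PER_SECOND * BITS_PER_SAMPLE // 8

# One hour of speech sound.
bytes_per_hour = bytes_per_second * 3600

# Recordable hours on one CD-ROM (a little over 25 hours).
recordable_hours = SPEECH_CAPACITY_BYTES / bytes_per_hour

# Directory size for a one hour story divided into segments.
segments_per_hour = 3600 * SEGMENTS_PER_SECOND
directory_bytes = segments_per_hour * DIRECTORY_ENTRY_BYTES
```

The resulting directory size of roughly 389 kilobytes is well 
under the 3 megabytes reserved for directories. 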

Fig. 6 is a detailed illustration of handset 8 and earphone 
13 shown in Fig. 4. The front side of handset 8 is provided 
with a segment display part 21 for displaying the segment number, 
an FF switch 22 for fast-forwarding, A, B and C instruction 
switches 23A, 23B and 23C for instructing the respective 
reproductions of speech sound data sequences A, B and C, a STOP 
button 24 for instructing the stoppage of the reproduction, 
and a REV button 25 for instructing the return of reproduction. 
In addition, earphone 13 is connected to handset 8 via a cord 
26 and handset 8 is connected to the body via a cord 27. 

Fig. 7(a) is a perspective view of the above described 
reproduction unit and Fig. 7(b) is a perspective view of this 
reproduction unit being contained in a case 29 to which a strap 
28 is attached. The CD-ROM has been miniaturized and reduced 
in weight as a result of the recent progress in technology, 
and therefore, can be made portable as shown in the figures. 
Fig. 8 shows the condition where the handset is hand held. 

Next, the effects of the present embodiment are described 
in reference to one example of the reproduction sequence 
according to the reproduction system above described. Fig. 9 
is the sequence diagram thereof. Fig. 9(a) shows the flow of 
data when the portion of data sequence A, that is to say, the 
native speaker English, is continuously heard. At this time, 
switch 23A in Fig. 6 is pressed so that the reproduced sound 
is heard as is. 

Next, the sequence of Fig. 9(b) is described. First, data 
sequence A of segment 621 is reproduced. Here, when "much of 
a" of segment 623 cannot be heard well, STOP button 24 is 
immediately pressed. At this time, the segment number of 
segment display part 21 has become 624. Thus, REV button 25 
is pressed only once. Here, when REV button 25 is held down 
the segment continuously moves backwards and every time REV 
button 25 is pressed the segment of display part 21 moves 
backwards by one. When REV button 25 is pressed once under the 
condition where the segment is 624, the segment of display part 
21 becomes 623. 

Next, when switch 23B is pressed, "much / of / a" is heard spoken 
slowly by separating the data word by word. When the unit is 
left as is, it proceeds to "problem" through data sequence B. 
When "much / of / a" has been heard, STOP button 24 is pressed 
and REV button 25 is again pressed so that the segment becomes 
623. Then, when switch 23C is pressed, the Japanese explanation 
data sequence C can be heard. 
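
The button sequence of Fig. 9(b) can be modelled as a small 
state machine. This is only a sketch: the class Handset and its 
method names are hypothetical, and it is assumed that the 
displayed segment number advances by one each time a segment 
finishes playing and never goes below 1. 

```python
class Handset:
    """Minimal model of the segment counter on display part 21."""

    def __init__(self, segment=1):
        self.segment = segment    # number shown on segment display part 21
        self.sequence = None      # "A", "B", "C", or None while stopped

    def press_sequence(self, name):
        """Switches 23A, 23B, 23C: start reproducing at the shown segment."""
        if name not in ("A", "B", "C"):
            raise ValueError("unknown data sequence: %r" % (name,))
        self.sequence = name
        return (self.sequence, self.segment)

    def segment_finished(self):
        """Called when one segment has been reproduced to its end."""
        if self.sequence is not None:
            self.segment += 1

    def press_stop(self):
        """STOP button 24: halt reproduction, keep the segment number."""
        self.sequence = None

    def press_rev(self):
        """REV button 25: one press moves the segment back by one."""
        self.segment = max(1, self.segment - 1)


handset = Handset(segment=621)
handset.press_sequence("A")
for _ in range(3):                # segments 621, 622 and 623 are heard
    handset.segment_finished()
handset.press_stop()              # display now shows 624
handset.press_rev()               # display shows 623 again
replay = handset.press_sequence("B")   # slow version of segment 623
```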

Some portions of the explanation relate to phrases made up 
of several words, and therefore several segment numbers may be 
covered by one explanation. When the lowest level bit of the 
byte showing attribute C is 1 (for example, where attribute 
C is 129), the attribute means that the segment has the same 
content as the previous segment number of the same data 
sequence. When such a segment is encountered, it can be skipped. 

Next, the second example of a recording medium and 
reproduction system according to the present invention is 
described in reference to Fig. 10 through Fig. 12. Fig. 10 is 
a view showing a directory wherein the entry for one segment 
is formed of 12 bytes. That is to say, attribute C of one byte, 
segment number SS of two bytes, suffix number N of one byte, 
minute M of one byte, second S of one byte, block B of one 
byte, start byte SB of two bytes and byte length LLL of three 
bytes add up to 12 bytes. 

In addition, concerning the 8-bit attribute C, when the first 
bit (top level bit) is 1, it indicates the start of the segment, 
and when the first bit is 0, it indicates another condition. 
When the value of the second and third bits is 0, it indicates 
that the segment is of data sequence A; when the value of the 
second and third bits is 1, it indicates that the segment is 
of data sequence B; and when the value of the second and third 
bits is 2, it indicates that the segment is of data sequence 
C. When the fourth bit is 0, it indicates there is no suffix; 
and when the fourth bit is 1, it indicates there is a suffix. 
When the fifth bit is 1, it indicates the segment has the same 
content as the previous segment; and when the fifth bit is 0, 
the segment does not have the same content. 
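
A 12-byte directory entry of this kind could be decoded as in 
the following sketch. The field order and widths follow Fig. 10 
as described above. Two points are assumptions made here and 
are not stated in the text: multi-byte fields are read as 
big-endian, and the bits of attribute C are counted from the 
top level bit downward. The names DirectoryEntry and 
parse_entry are hypothetical. 

```python
import struct
from typing import NamedTuple


class DirectoryEntry(NamedTuple):
    starts_segment: bool     # first (top level) bit of attribute C
    sequence: str            # second and third bits: 0=A, 1=B, 2=C
    has_suffix: bool         # fourth bit
    same_as_previous: bool   # fifth bit
    segment_number: int      # SS, two bytes
    suffix_number: int       # N, one byte
    minute: int              # M, one byte
    second: int              # S, one byte
    block: int               # B, one byte
    start_byte: int          # SB, two bytes
    byte_length: int         # LLL, three bytes


SEQUENCES = {0: "A", 1: "B", 2: "C"}


def parse_entry(raw: bytes) -> DirectoryEntry:
    """Decode one 12-byte entry (big-endian byte order is assumed)."""
    if len(raw) != 12:
        raise ValueError("a directory entry must be exactly 12 bytes")
    attr, ss, n, m, s, b, sb = struct.unpack(">BHBBBBH", raw[:9])
    lll = int.from_bytes(raw[9:12], "big")   # three-byte length field
    return DirectoryEntry(
        starts_segment=bool(attr & 0x80),
        sequence=SEQUENCES.get((attr >> 5) & 0x03, "?"),
        has_suffix=bool(attr & 0x10),
        same_as_previous=bool(attr & 0x08),
        segment_number=ss,
        suffix_number=n,
        minute=m,
        second=s,
        block=b,
        start_byte=sb,
        byte_length=lll,
    )
```

Values of the second and third bits other than 0, 1 and 2 are 
reported as "?" in this sketch. 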

Fig. 11 shows data sequences A, B, and C being divided into 
segments according to the directory of Fig. 10. Thus, this 
differs from the above described first embodiment in the point 
that the data sequences are divided into segments with suffixes 
in this embodiment. Fig. 12 is a view showing the relationship 
between the values of the second and third bits of attribute 
C which has 8 bits and the segments. 

Next, a recording medium and reproduction system according 
to the third example of the present invention is described in 
reference to Fig. 13 and Fig. 14. This example differs from 
the first and second examples in the point wherein text data 
is recorded and reproduced in addition to speech sound data. 

Fig. 13 is a view showing the directory of this example. 
This differs from the directory of Fig. 10 in the point that 
data of D = 3 is added to the second and third bits of 
attribute C in this example. When the value of these bits is 
3, it indicates that text data sequence D is recorded on the 
medium in the form of a code. 

Fig. 14 is a frontal view of a handset used for the 
reproduction unit according to this example. This differs 
from the handset of Fig. 6 in the point wherein a text display 
part 41 formed of, for example, an LCD and a text display button 
42 for turning on and off the display of text are provided with 
this handset. When text display button 42 is pressed in this 
kind of handset 8, it turns text display part 41 on and off so 
that the text data sequence D is displayed or is not displayed. 

Next, the characteristics of the effects of this third 
embodiment are described. 

The speed of the appearance of text data sequence D is 
controlled in accordance with the speed of data sequence A being 
spoken, that is to say, the length of LLL, during the 
reproduction of speech sound data sequence A. Thus, the text 
begins to appear on display part 41 at the start of the speech 
sound of the above described segment and completely finishes 
appearing by the end of that speech sound. That is to say, the 
text is displayed in complete synchronization with the speech 
sound. 
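
One way to realize this pacing is sketched below. It is an 
assumption of this sketch, not a statement of the text, that 
the characters are revealed linearly over the segment and that 
the 6 kilobytes/second rate of the first example applies, so 
that LLL / 6000 gives the segment duration in seconds. The 
function names are hypothetical. 

```python
BYTES_PER_SECOND = 6000   # speech data rate of the first example (assumed here)


def segment_duration(byte_length):
    """Playing time in seconds of a segment whose byte length is LLL."""
    return byte_length / BYTES_PER_SECOND


def visible_text(text, byte_length, elapsed):
    """Return the part of the text shown after `elapsed` seconds of playback.

    Characters are revealed linearly so that the whole text is visible
    exactly when the speech sound of the segment ends.
    """
    duration = segment_duration(byte_length)
    if duration <= 0 or elapsed >= duration:
        return text
    if elapsed <= 0:
        return ""
    shown = int(len(text) * elapsed / duration)
    return text[:shown]
```

For example, for a one second segment (LLL = 6000) carrying 
the text "much of a", half of the characters, rounded down, are 
visible after 0.5 seconds and all of them after one second. 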

Next, during the reproduction of speech sound data sequence 
B, text data sequence D is displayed in synchronization with 
the length of speech sound data sequence B, and at this time 
the speech sound is reproduced for a period of time several 
times longer than the time of reproduction of speech sound data 
sequence A. In practice, it is considered to be more convenient 
for the user if the display of the text comes out sooner (the 
output of speech sound is slightly delayed), and this requires 
further examination at the time when the unit is put into 
practice. 

Next, a case wherein the data recording medium is a magnetic 
tape that cannot be accessed at random is described as the 
fourth embodiment of the present invention. 

In this example, the basic mode of data recording and 
reproduction is the same as in the above described examples, 
but reproduction by skipping data on the magnetic tape is not 
easy. Therefore, it is desirable to provide a buffer for 
temporarily storing data in order to make the reproduction unit 
practical. Concretely, the respective small units corresponding 
to data sequences A, B and C (three tracks) of the above 
described virtual tracks are aligned in order so that the 
corresponding units of data sequences A, B and C are put 
together in this manner: A1, B1, C1, A2, B2, C2, A3, B3, C3 ... 
Thus, when data sequence A is reproduced, only the A's such as 
A1, A2, A3 ... are chosen for the reproduction and when data 
sequence B is reproduced, only the B's such as B1, B2, B3 ... 
are chosen for the reproduction. The same can be carried out 
for data sequence C. At this time, B1 and C1 must be skipped 
in order to reproduce A2 next to A1 without interruption. This 
is not easy to do with a conventional tape, and therefore data 
must be brought into the buffer ahead of time from the magnetic 
tape. Thus, the skipping process and reproduction are carried 
out in the buffer. 
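
The read-ahead and skip operation can be sketched as follows. 
The tape is modelled as a list of (track, unit) pairs that can 
only be read in order; the function names and the buffer size 
of six units are hypothetical choices made for illustration. 

```python
from collections import deque


def interleave(a_units, b_units, c_units):
    """Lay the units out on the tape as A1, B1, C1, A2, B2, C2, ..."""
    tape = []
    for a, b, c in zip(a_units, b_units, c_units):
        tape.extend([("A", a), ("B", b), ("C", c)])
    return tape


def reproduce(tape, wanted, buffer_size=6):
    """Yield only the units of track `wanted`, reading the tape in order.

    Units are first brought into a buffer ahead of time; the units of
    the other tracks are skipped inside the buffer, not on the tape.
    """
    buffer = deque()
    source = iter(tape)          # the tape can only be read sequentially
    exhausted = False
    while True:
        # Read ahead until the buffer is full or the tape ends.
        while not exhausted and len(buffer) < buffer_size:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if not buffer:
            return
        track, unit = buffer.popleft()
        if track == wanted:
            yield unit           # reproduce; other tracks are skipped


tape = interleave(["A1", "A2", "A3"], ["B1", "B2", "B3"], ["C1", "C2", "C3"])
only_a = list(reproduce(tape, "A"))
```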

In the case wherein this magnetic tape is a so-called DAT 
(digital audio tape), the following process occurs. First, 
speech sound of a native speaker recorded on track A is 
separated into certain units as sections. For example, the 
speech sound is separated into units of a constant period of 
time such as one second. On the other hand, track B records 
the contents spoken slowly (for example, at 1/3 the speed of 
track A on average). Regions are secured on the tape so that 
if a region of 1 second is taken up on track A, it corresponds 
to a region of four seconds on track B. In addition, the regions 
are secured so that a region of 9.4 seconds can be taken up 
on track C for the explanation. The above described number of 
seconds here is an example and the present invention is not 
limited to this. In this example, 14.4 seconds are used as one 
unit. That is to say, three segments of tracks A, B, and C that 
correspond to each other are put together to make up 14.4 
seconds. It should be noted that a portion of one second on 
track A is equivalent to one second that is cut out of the 
original sound data. A portion of four seconds on track B may 
have a content that is within four seconds, but may finish in 
3 seconds, for example, with an extra second remaining. Here, 
a method for skipping this remaining portion (a one second 
portion) at the time of reproduction is described below. 

An example of a rotary head type DAT is cited as follows. 
A system wherein two tracks are recorded or reproduced during 
one rotation of the rotary head is generally used in this 
example. Thus, speech sound data of 2,880 bytes can be recorded 
on one track. Accordingly, in the case where one segment is 
formed of 30 tracks, the segment is made up of 86,400 bytes. 
In the case wherein the speech sound utilized in the above 
described embodiments is recorded in this segment at a rate 
of 48 kilobits/second (6,000 bytes/second), speech sound of 
86,400 bytes / 6,000 bytes/second = 14.4 seconds can be 
recorded. 

Next, the buffer is described. 

When a RAM of 1 MB is utilized, 12 buffers can be provided 
as a result of this equation: 1,048,576 / 86,400 = approximately 
12.1. The 12 buffers are arranged to function in a ring form, 
so that two buffers are always selected for the data transfer 
from the tape to these buffers. Reproduction is carried out by 
taking out the data from the buffers. When the reproduction is 
stopped at an arbitrary place, the speech sound (already emitted 
from the speaker as a reproduced sound) before the point of time 
where the reproduction is stopped remains in the other ten 
buffers. That is to say, speech sound for ten seconds of track 
A can be repetitively reproduced from the buffers without 
rewinding the tape. 
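
The ring arrangement can be sketched as below. This is a 
simplified model under stated assumptions: each buffer holds 
one 86,400-byte segment, the two buffers due to be overwritten 
next are treated as the ones reserved for transfer from the 
tape, and the remaining ten are treated as replayable. The 
class SegmentRing and its method names are hypothetical. 

```python
RAM_BYTES = 1_048_576      # 1 MB of RAM (from the text)
SEGMENT_BYTES = 86_400     # one segment of 30 tracks (from the text)

buffer_count = RAM_BYTES // SEGMENT_BYTES   # whole segments that fit in RAM


class SegmentRing:
    """Ring of segment buffers with some slots reserved for loading."""

    def __init__(self, slots=12, loading=2):
        self.slots = [None] * slots
        self.loading = loading   # slots reserved for transfer from the tape
        self.write = 0           # index of the next slot to be overwritten
        self.count = 0           # total number of segments loaded so far

    def load(self, segment):
        """Store the next segment read from the tape, overwriting the oldest."""
        self.slots[self.write] = segment
        self.write = (self.write + 1) % len(self.slots)
        self.count += 1

    def replayable(self):
        """Return the most recent segments outside the reserved slots, oldest first."""
        keep = len(self.slots) - self.loading
        n = min(keep, self.count)
        start = (self.write - n) % len(self.slots)
        return [self.slots[(start + i) % len(self.slots)] for i in range(n)]


ring = SegmentRing(slots=buffer_count)
for segment_number in range(1, 31):
    ring.load(segment_number)
recent = ring.replayable()   # the ten most recently loaded segments
```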

In such a configuration, the 10 seconds of speech sound data 
that remains in the ten buffers can be easily transferred to 
any of the virtual tracks A, B or C in an arbitrary segment 
in the same manner as the above described embodiment using a 
CD-ROM. 

Here, random access cannot be carried out as with a CD-ROM 
and, therefore, the total period of time of the segments of 
the three tracks A, B and C that correspond to each other must 
always be within 14.4 seconds. At this time, the portions of 
virtual track A may be mechanically separated into constant 
time intervals (for example, one second intervals). The 
portions of track B may be two seconds or five seconds, for 
example, and are therefore variable. A period of time gained 
by subtracting the periods of time of tracks A and B from 14.4 
seconds is the period of time that is allocated for track C. 
Here, it is not necessary for both tracks B and C to record 
the portions that completely correspond to the portion in the 
segment of track A; rather, they may form a coordinated unit 
with a number of segments in the vicinity. 

A DAT has a sub-code region, and therefore data of byte numbers 
indicating the respective segments (each segment being 14.4 
seconds, or 30 tracks) and the boundaries between tracks A, B 
and C in the respective segments can be recorded in this 
sub-code region. 

Next, an amount of speech sound data that can be contained 
in a DAT is concretely calculated. 

In general, a DAT of one hour can record 240,000 tracks. 
One segment is made up of 30 tracks and the following 6 tracks 
are used as pause tracks so that the portions recording speech 
sound data do not become scratched even in the case wherein 
the rotary head passes through a great number of times when 
the tape is stopped. In such a configuration, one unit is formed 
of 36 tracks so that 6,666 units can be recorded in total, which 
becomes 6,666 * 14.4 seconds = 26 hours, 39 minutes and 50 
seconds. One hour out of that is allocated for track A and 
therefore 25 hours, 39 minutes and 50 seconds can be used for 
tracks B and C. Even if four hours are used for track B spoken 
slowly, more than 21 hours can be allocated for track C for 
the explanation and, therefore, the tape has a sufficient 
length. 
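
The DAT figures above can be reproduced with the following 
sketch. All constants come from the text; the variable names 
are chosen here for illustration. Integer arithmetic is used 
so that the fractional 0.4 second of the total is dropped, as 
in the text. 

```python
TRACK_BYTES = 2_880        # speech sound data per track (from the text)
SEGMENT_TRACKS = 30        # tracks per segment (from the text)
PAUSE_TRACKS = 6           # pause tracks after each segment (from the text)
TAPE_TRACKS = 240_000      # tracks on a one hour DAT (from the text)
BYTES_PER_SECOND = 6_000   # 48 kilobits/second (from the text)

segment_bytes = TRACK_BYTES * SEGMENT_TRACKS          # bytes per segment
segment_seconds = segment_bytes / BYTES_PER_SECOND    # seconds per segment

unit_tracks = SEGMENT_TRACKS + PAUSE_TRACKS           # 36 tracks per unit
units = TAPE_TRACKS // unit_tracks                    # whole units on the tape

# Total playing time in whole seconds, then split into h / min / s.
total_seconds = units * segment_bytes // BYTES_PER_SECOND
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

# One hour is used for track A and four hours for track B.
track_c_seconds = total_seconds - 1 * 3600 - 4 * 3600
```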

Here, as for how the segments of tracks A, B and C are created, 
track A may be divided at places where there are some pauses 
according to speech sound, so that the length of the segments 
becomes approximately one second, instead of dividing track A 
into constant intervals of one second, in the same manner as 
in the above described application of a CD-ROM. That is to say, 
the length of the segments may vary according to the edit policy 
at the time of recording. Conversely, a medium that can be 
randomly accessed, such as a CD-ROM, may have sections created 
by separating the medium into constant time intervals. 

The present invention is not limited to the above described 
embodiments but rather a variety of modifications are possible. 

The present invention can be applied to a system, for example, 
that connects to a video disk or a video tape. That is to say, 
a soundtrack of a movie spoken at regular speed (speed of a 
native speaker) is placed on track A; speech sound spoken slowly 
is placed on track B; and Japanese explanation is placed on 
track C. Thus, the reproduction of track A can be stopped at 
a place that cannot be understood and slightly moved back (at 
this time, the image can be held still and at the same time 
the system can be practically used) so that track B can be heard, 
and if it still cannot be understood, track C can be used. Here, 
the speech sound may be synchronized with the images only at 
the time track A is reproduced. 

In addition, the present invention can be used on a personal 
computer or the like. That is to say, a high level application 
becomes possible by combining with a CAI (Computer Aided 
Instruction) system or a programmable high level system such 
as a CDI (Compact Disk Interactive). In addition, it is also 
possible to use the memory device of a personal computer in 
place of the above described CD or DAT in order to implement 
the present invention. 

Furthermore, the present invention can be applied to the 
study of recitation of Chinese poems and the study of law, in 
addition to the study of English conversation. Moreover, data 
sequences are not limited to three types or four types, but 
rather may be of two types or no less than five types. 

[Effects of the Invention] 

As described above in detail, a data recording medium 
according to the present invention can record speech sound data, 
and the like, having appropriate contents desired by experts 
and novices and, therefore, a data recording medium that can 
satisfy both experts and novices can be obtained. 

In addition, according to a data reproduction system in 
accordance with the present invention, speech sound data, and 
the like, having appropriate contents for experts and novices 
can be reproduced by using a recording medium according to the 
present invention and, therefore, a data reproduction system 
that can satisfy both experts and novices can be obtained. 

4. BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is an explanatory view showing data sequences A, B, 
and C and the recorded contents thereof when a recording medium 
of the present invention is applied for the self-study of 
English conversation; Fig. 2 is an explanatory view showing 
the relationship between a period of time and a capacity of 
each segment in the example shown in Fig. 1; Fig. 3 is an 
explanatory view showing one example of a directory of the 
example shown in Fig. 1 and Fig. 2; Fig. 4 is a block diagram 
showing the configuration of one example of a reproduction 
unit; Fig. 5(a) and Fig. 5(b) are diagrams showing memory 
allocations in ROM 6 and in RAM 7; Fig. 6 is an explanatory 
view showing handset 8 and earphone 13 shown in Fig. 4; Fig. 
7 is a perspective view of the reproduction unit shown in Fig. 
4; Fig. 8 is a view showing the condition where the handset 
is hand held; Fig. 9 is a sequence diagram showing the effects; 
Fig. 10 is an explanatory view showing a directory according 
to the second example of the present invention; Fig. 11 is a 
view showing divided segments of data sequences A, B and C 
according to the directory of Fig. 10; Fig. 12 is a view showing 
the relationship between a value of the second and third bits 
and a segment of attribute C of 8 bits; Fig. 13 is an explanatory 
view showing a directory according to the third example of the 
present invention; and Fig. 14 is a frontal view of a handset 
used in the reproduction unit according to this example. 

1 ... medium, 2 ... reproduction mechanism, 8 ... handset, 
13 ... earphone, 21 ... segment display part, 22 ... FF switch, 
23A ... A instruction switch, 23B ... B instruction switch, 
23C ... C instruction switch, 24 ... STOP button, 25 ... REV 
button, 41 ... text display part, and 42 ... text display button 